feat: auto-select ONNX Runtime providers (CUDA / CPU) for training, eval, and inference by bnovik0v · Pull Request #70 · livekit/livekit-wakeword

bnovik0v · 2026-04-21T17:39:39Z

The bug

Every ort.InferenceSession in the project is created with providers=["CPUExecutionProvider"] hardcoded, in four places:

src/livekit/wakeword/models/feature_extractor.py:42-44 (Mel)
src/livekit/wakeword/models/feature_extractor.py:101-103 (Embedding)
src/livekit/wakeword/inference/model.py:87-89 (classifier)
src/livekit/wakeword/eval/evaluate.py:197 (eval)

Even if a user installs onnxruntime-gpu on a GPU pod, CUDA is never selected. The Swift package already supports pluggable execution providers (ExecutionProvider.coreML), so this gap is Python-specific.

Impact

Feature extraction during augment is the most ORT-heavy stage — roughly 17 model calls per 2 s clip (1 mel + 16 sliding embedding windows). On an RTX 3090 pod this runs on CPU at ~3 clips/sec. A production config (~150k augmented clips across 6 splits × 2 rounds) takes ~14 hours of CPU time; with CUDA active, the same workload finished in ~13 minutes on the same pod.

The bug also affects WakeWordModel (the listener) and run_eval, both of which silently stay on CPU even when the host has onnxruntime-gpu installed.

The fix

New helper src/livekit/wakeword/_ort_providers.py with a single function:

def get_providers() -> list[str]:
    # 1. LIVEKIT_WAKEWORD_ORT_PROVIDERS env var (comma-separated) wins if set
    # 2. Otherwise intersect ort.get_available_providers() with
    #    ("CUDAExecutionProvider", "CPUExecutionProvider"): CUDA first, CPU fallback
    # 3. If neither is available, fall through to whatever ORT reports
    ...

All four call sites now use it. Behaviour on CPU-only installs is unchanged (plain onnxruntime reports only CPUExecutionProvider).
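
The three selection rules above can be sketched as a pure function. This is a minimal illustration, not the code in `_ort_providers.py`: the name `select_providers` and its `available` / `env` parameters are hypothetical, added so the logic can be exercised without onnxruntime installed. It also assumes the env var override is returned as given; whether the real helper filters the override against the available providers is not shown in this description.

```python
import os
from collections.abc import Mapping, Sequence

ENV_VAR = "LIVEKIT_WAKEWORD_ORT_PROVIDERS"
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def select_providers(
    available: Sequence[str],
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Pick ONNX Runtime providers from an explicit list of available ones.

    Hypothetical stand-in for get_providers(); `available` plays the role
    of ort.get_available_providers().
    """
    # 1. Env var override: comma-separated, whitespace and empty entries dropped.
    override = [p.strip() for p in env.get(ENV_VAR, "").split(",") if p.strip()]
    if override:
        return override
    # 2. Preferred providers that are actually available, CUDA before CPU.
    chosen = [p for p in PREFERRED if p in available]
    # 3. Neither is available: fall through to whatever was reported.
    return chosen or list(available)
```

In the real helper, `available` would come from `ort.get_available_providers()`, and the returned list would be passed as `providers=` when constructing each `ort.InferenceSession`.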

The env var escape hatch covers:

Forcing CPU on GPU hosts for reproducibility
Opting into CoreML / DirectML / ROCm / TensorRT providers without plumbing a config field

Design decisions:

Helper, not duplicated call-site logic — four copies of provider selection would drift.
Env var, not config field or constructor arg — a config field only helps YAML-driven training, not library consumers of WakeWordModel; a constructor arg would thread through multiple public APIs. Env var works everywhere with zero surface change.
Default preference is CUDA / CPU only — CoreML / DirectML / ROCm / TensorRT are exotic enough to warrant explicit opt-in via the env var.
No warning when CPU is selected on a GPU host — can't cheaply detect "GPU hardware present" from Python without importing torch; users who suspect slowness can check the INFO log for the selected provider list.

Packaging

Adds an optional gpu extra in pyproject.toml:

gpu = ["onnxruntime-gpu>=1.17"]

README gets a new "GPU acceleration" subsection documenting the switch:

pip uninstall -y onnxruntime
pip install "livekit-wakeword[train,eval,export,gpu]"

The uninstall step is required because onnxruntime and onnxruntime-gpu share the Python module name — pip cannot keep them side-by-side. The README also points at ONNX Runtime's GPU compatibility matrix since the onnxruntime-gpu wheel bundles specific CUDA toolkit versions.
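
To check whether an environment has ended up with both wheels, the installed distributions can be listed by name. A minimal sketch using only the standard library; the function name `installed_ort_distributions` is hypothetical and not part of this PR.

```python
from importlib.metadata import PackageNotFoundError, version

# Both distributions install the same importable module, `onnxruntime`.
ORT_DISTRIBUTIONS = ("onnxruntime", "onnxruntime-gpu")


def installed_ort_distributions() -> dict[str, str]:
    """Map each installed ONNX Runtime distribution name to its version."""
    found: dict[str, str] = {}
    for name in ORT_DISTRIBUTIONS:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # This distribution is not installed in the current environment.
            continue
    return found


if __name__ == "__main__":
    found = installed_ort_distributions()
    if len(found) > 1:
        print("Both wheels are installed and share one module:", found)
    else:
        print("Installed ONNX Runtime distributions:", found)
```

If both names show up, the environment is in the state the uninstall step is meant to avoid.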

What's in this PR

| File | Change |
| --- | --- |
| `src/livekit/wakeword/_ort_providers.py` | New: `get_providers()` helper |
| `src/livekit/wakeword/models/feature_extractor.py` | Use helper in `MelSpectrogramFrontend` + `SpeechEmbedding` |
| `src/livekit/wakeword/inference/model.py` | Use helper in `WakeWordModel.load_model` |
| `src/livekit/wakeword/eval/evaluate.py` | Use helper in `run_eval` |
| `pyproject.toml` | New `gpu` optional extra |
| `README.md` | GPU acceleration subsection in the training install flow |
| `tests/test_ort_providers.py` | New: 12 tests covering env-var override parsing, auto-detection, filtering, logging, and a real-call-site smoke test on `MelSpectrogramFrontend` |
| `uv.lock` | Regenerated for the new extra |

Out of scope (deliberately — these are follow-ups)

Cross-clip batching in run_extraction: features.py processes one clip at a time. Even on GPU this leaves throughput on the table; a batched path would give another large speedup. Worth its own PR — touches the extraction loop, not provider selection.
Splitting augment into augment + extract CLI commands: right now run_augmentation deletes all *_rN.wav files at the top, so if feature extraction fails the user re-runs and loses the augmented clips. A --no-clean flag and/or a separate extract command is a UX concern, not a provider concern.

Verification

uv run ruff check src/livekit/wakeword/_ort_providers.py <touched files>  # clean
uv run ruff format --check <touched files>                                 # clean
uv run mypy src/livekit/wakeword/                                          # same 1 pre-existing error as main
uv run pytest tests/                                                       # 72 passed (60 existing + 12 new)

Reproduction of the original bug:

livekit-wakeword generate configs/prod.yaml
livekit-wakeword augment configs/prod.yaml
# Feature extraction sub-stage runs at ~3 cps on a GPU pod. nvidia-smi shows 0% util.

After this PR, with livekit-wakeword[train,eval,export,gpu] installed (and onnxruntime uninstalled), the same workload saturates the GPU — measured ~200 cps on RTX 3090.
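
The reported durations follow directly from the reported throughputs. A quick cross-check, using only the approximate figures quoted in this description (~150k clips, ~3 cps on CPU, ~200 cps on GPU):

```python
CLIPS = 150_000   # ~150k augmented clips in the production config
CPU_RATE = 3      # clips/sec on CPU, as reported
GPU_RATE = 200    # clips/sec with CUDA on the RTX 3090, as reported


def duration_seconds(clips: int, clips_per_second: float) -> float:
    """Wall-clock time to process `clips` at a constant throughput."""
    return clips / clips_per_second


cpu_hours = duration_seconds(CLIPS, CPU_RATE) / 3600
gpu_minutes = duration_seconds(CLIPS, GPU_RATE) / 60
speedup = GPU_RATE / CPU_RATE

print(f"CPU: {cpu_hours:.1f} h, GPU: {gpu_minutes:.1f} min, speedup: {speedup:.0f}x")
# prints "CPU: 13.9 h, GPU: 12.5 min, speedup: 67x"
```

These values are consistent with the ~14 hours and ~13 minutes quoted under Impact.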

…val, and inference

Every ort.InferenceSession in the project is currently created with providers=["CPUExecutionProvider"] hardcoded, in four places:

- models/feature_extractor.py:42-44 (Mel)
- models/feature_extractor.py:101-103 (Embedding)
- inference/model.py:87-89 (classifier)
- eval/evaluate.py:197 (eval)

Even if a user installs `onnxruntime-gpu` on a GPU pod, CUDA is never selected. Feature extraction during augment (the most ORT-heavy stage, ~17 model calls per clip) runs single-threaded on CPU at ~3 clips/sec on an RTX 3090, making augment the bottleneck of the full pipeline.

This patch centralises provider selection in a new `_ort_providers.get_providers()` helper:

- Default: intersect `ort.get_available_providers()` with ("CUDAExecutionProvider", "CPUExecutionProvider"), giving CUDA when the GPU wheel is installed and CPU otherwise. Zero behaviour change for CPU-only installs.
- Override: LIVEKIT_WAKEWORD_ORT_PROVIDERS env var (comma-separated) for forcing CPU for reproducibility, or opting into CoreML / DirectML / ROCm / TensorRT.
- All four sites now call `get_providers()` instead of hardcoding.

Also adds an optional `gpu` extra pinning `onnxruntime-gpu>=1.17`, and a README subsection explaining the `pip uninstall onnxruntime` switch (the CPU and GPU wheels share a Python module name and cannot coexist).

Scope limited to provider selection. Cross-clip batching in run_extraction and splitting augment into augment + extract CLI commands are follow-up PRs, not bundled here.

pham-tuan-binh · 2026-06-03T05:44:18Z

The uninstall step for onnxruntime-gpu doesn't look clean to me at the moment. That's the only blocker.

The rest looks fine.

Is there any way we can integrate this seamlessly into the uv flow without having to run uninstall?

onnxruntime was a base dep with onnxruntime-gpu added on top via the gpu extra, so installing [gpu] pulled both wheels (same import path) and the README needed a manual `pip uninstall onnxruntime`. Model the backend as two conflicting extras instead:

- remove onnxruntime from base deps; add cpu/gpu extras (floor 1.20)
- drop onnxruntime from the export extra (would reintroduce the CPU wheel)
- declare [tool.uv] conflicts so `uv sync --extra gpu` swaps the wheel with no manual uninstall
- import_ort() raises an actionable error when no backend is installed
- README + CI updated for the [cpu]/[gpu] split

Breaking: bare `pip install livekit-wakeword` no longer ships a runtime; consumers must pick [cpu] or [gpu].

bnovik0v · 2026-06-04T16:20:32Z

Reworked the packaging to drop the uninstall step. onnxruntime is no longer a base dep — the backend is now a pair of mutually-exclusive extras (cpu → onnxruntime, gpu → onnxruntime-gpu), declared as conflicting via [tool.uv]:

[tool.uv]
conflicts = [[{ extra = "cpu" }, { extra = "gpu" }]]

So in this repo uv sync --extra gpu swaps the wheel with no manual uninstall (verified — it removes onnxruntime and installs onnxruntime-gpu in one step), and asking for both at once is a hard error. This matches uv's own PyTorch cpu/gpu pattern, and is where fastembed landed for the identical "same import path" problem.

Two caveats I want to be upfront about rather than oversell:

[tool.uv].conflicts is not in the published wheel metadata — it's honored when developing in this repo (git clone + uv sync), but a downstream pip install or uv add livekit-wakeword[gpu] doesn't see it. The extras are still mutually exclusive there, just unenforced, so the README keeps a short note for switching an existing install between backends. There's no standard mechanism that fixes the downstream case — onnxruntime/onnxruntime-gpu fundamentally can't coexist and Python packaging has no "provides/conflicts" that survives publication.
This is a breaking change: bare pip install livekit-wakeword no longer ships a runtime, so consumers now pick [cpu] or [gpu]. Added an actionable error (import_ort()) instead of a raw ModuleNotFoundError, and updated the README/CI accordingly. Worth a minor-version bump on release.

The provider auto-selection itself is unchanged. Let me know if you'd rather keep bare install working (would mean keeping onnxruntime in base and living with the uninstall for GPU) — happy to go either way.
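
For reference, the kind of actionable error described above can be sketched as follows. This is an illustration only, not the actual `import_ort()` from this PR: the function name `import_backend`, its `module_name` parameter, and the message wording are all assumptions.

```python
import importlib
from types import ModuleType


def import_backend(module_name: str = "onnxruntime") -> ModuleType:
    """Import the runtime module, or raise an error that says how to fix it.

    Hypothetical stand-in for import_ort(); the parameter exists only so
    the failure path can be exercised with a module that is not installed.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        if exc.name != module_name:
            # The backend is present but one of its own imports is missing.
            raise
        raise ImportError(
            "No ONNX Runtime backend is installed. Install one of: "
            "'livekit-wakeword[cpu]' or 'livekit-wakeword[gpu]'."
        ) from exc
```

Chaining with `from exc` keeps the original `ModuleNotFoundError` in the traceback, so the underlying cause is still visible alongside the install hint.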

bnovik0v wants to merge 2 commits into livekit:main from bnovik0v:feat/ort-gpu-providers
